Abstract We study a class of dynamic games with a continuum of atomless players where
each player controls a semi-Markov process of individual states, while the global state of the 
game is the aggregation of individual states of all the players. The model differs from standard 
models of dynamic games with continuum of players known as mean field or anonymous 
games in that the moments when the decisions are made are discrete, but different for each 
of the players. As a result, the individual states of each player follow a continuous time 
Markov chain, but the global state follows an ordinary differential equation. Games of this 
type were introduced by Gomes et al. (Appl Math Optim 68:99-143, 2013) and have received
some attention in the literature in the last few years. In our paper we introduce a novel model
of this type where players maximize their cumulative payoffs over their lifetime. We show 
that the payoffs of the players using any stationary strategy of a certain class in a game with 
continuum of players are close to those obtained in n-person counterparts of this game for 
n large enough. This implies that equilibrium strategies in the anonymous model can well 
approximate equilibria in related games with large finite number of players. In the rest of 
the paper we concentrate on a subclass of games where the payoff and transition probability 
functions exhibit some strategic complementarities between players. In that case we prove 
that the game possesses a stationary equilibrium. Moreover, largest and smallest equilibrium 
strategies are nondecreasing in the states. It also turns out that these equilibria can be well 
approximated using a distributed iterative procedure. 
1 Introduction


Games with infinitely many atomless players have long been used in both engineering
and economics to model strategic interaction between a large number of players, when
the influence of an individual on the outcome of the game becomes negligible. Since the
pioneering papers of Schmeidler [31] and Wardrop [38], they have become an important tool 
in modelling competitive markets, stock exchange and exploitation of common resources on 
one side, and network congestion or power control on the other. 

Dynamic games of this type have been introduced in a paper by Jovanovic and Rosenthal 
[19]. In their framework each of the players controls an individual discrete-time Markov 
chain, while the global state of the game, defined as a probability distribution of individual 
states of all the players, becomes deterministic. The reward of an individual is then computed 
as the expectation of discounted sum of utilities obtained by him in infinitely many stages of 
the game. Some generalizations of their model were provided in [1,5-7]. The extension of 
their model to cover other utility criteria such as expected average utility and expected total 
utility was provided in [39]. 

Another important class of dynamic games with continuum of players has been introduced 
independently by Lasry and Lions [23-26] and by Huang et al. [16-18]. In their model the
time is continuous, and so the evolution of both the individual and the global state of the game
is described by ordinary differential equations. One can view their model as a generalization
of differential games to games with continuum of players, while that of [19] as an extension of 
Markov or stochastic games to games with infinitely many players. The papers of Lasry and
Lions have made an important impact on the entire game-theoretic community, additionally
providing the name which is now commonly used to describe games of both types: "mean-field
games". An overview of the state of the art in mean-field game theory can be found
in [11]; [1] includes a review of applications of mean-field games in economics, while [35]
takes a look at those in engineering.

In this paper we concentrate on an intermediate concept, linking some features of the
mean-field games of Lasry and Lions and the anonymous games of Jovanovic and Rosenthal. In our
model the moments when the decisions are made are discrete, but follow separate controlled
continuous time Markov chains, each controlled by a different player. As a result, these
moments are different for each of the players: the process of individual states for each
player is a continuous time Markov chain, but the global state is, as in other mean-field game
models, deterministic and follows an ordinary differential equation. A model of this type
first appeared in the literature in a seminal paper of Gomes et al. [10], where a characterization
in terms of differential equations and the main properties of this model were provided, together
with a result on the convergence of n-person counterparts of this game to the mean-field limit.
Further results of this type were provided in [9]. Some particular cases or applications of
this type of games were also studied in [13,14,20,21,40]. In this paper we introduce a novel
model of games of this type where players, instead of maximizing some payoff accumulated 
over the entire game, maximize the reward obtained during their lifetime, which may be 
different for different players. We assume that a dead player can be replaced after some 
time by a newborn one, and thus after some time we can obtain stationary behavior of the 
system which is then used to define a mean-field-type equilibrium. In the first part of the 
paper we give some sufficient conditions for these games to possess equilibria. These are of 
strategic complementarity type and are inspired by a paper on Markov-type discounted mean- 
field games [1]. Further, we show that the payoffs of the players using any given stationary 
strategy of a certain class in a semi-Markov mean-field game are close to those obtained in 
its n-person counterparts for n large enough. This implies that equilibrium strategies in the
anonymous model can well approximate equilibria in related games with large finite number 
of players. 

The organization of the paper is as follows: In Sect. 2 we present the general framework 
we are going to work with and define what kind of solutions we will be looking for. In 
Sect. 3 we present our main results about the existence of equilibrium in games with strategic 
complementarities and convergence to equilibrium of a simple learning procedure, followed 
by some examples of applications of our model. Section 4 contains results linking mean-field 
game model presented earlier with games with large finite number of players. It is followed 
by conclusions in Sect. 5. 


2 The Model 


In this section, we formally describe the game model and the solution we will analyze in the 
remainder of the paper. 
The semi-Markov mean-field game with total reward is described by the following objects: 


— The game is played by an infinite number (continuum) of players. Each player has his
own private state $s \in S$, changing over time. We assume that $S$ is a finite set. We assume
that there exists an element¹ $s_0$ standing for "death" of a player. Any player in state $s_0$ has
no choice of action to play and receives no rewards. Moreover, his reward is computed
over his "lifetime", that is, from one visit in state $s_0$ to his next visit there.

— The global state of the system at time $t$, $X_t$, is a probability measure over $S$. It describes
the mass of the population which is at time $t$ in each of the individual states. The set of
global states of the game is thus² $\Delta(S)$. We assume that any player $\alpha$ has an ability to
observe the global state of the game, so from his point of view the state of the game at
time $t$ is $(s_t^\alpha, X_t) \in S \times \Delta(S)$.

— We assume that the time is continuous, but the individual state of player $\alpha$ can only change
at specific times $T_0^\alpha, T_1^\alpha, \ldots$, where $T_0^\alpha = 0$. The time between successive transitions,
$\tau_k^\alpha = T_{k+1}^\alpha - T_k^\alpha$, is random, exponentially distributed with intensity $\lambda(s^\alpha_{T_k^\alpha}, X_{T_k^\alpha})$.
The $\tau_k^\alpha$ are independent random variables for different $k$ and $\alpha$. $\lambda$ is a positive,
Lipschitz-continuous function of the global state of the game.

— The set of actions available to a player in state $(s, X)$ is a nonempty set $A(s, X)$, with
$A := \bigcup_{(s, X) \in S \times \Delta(S)} A(s, X)$ a finite set. We assume that the mapping $A(\cdot, \cdot)$ is an upper
semicontinuous function. We also assume that any player in state $s_0$ plays some default
action $a_0$, not available in any other individual state.

Let $D$ denote the set of feasible state-action vectors, that is


\[ D := \{(s, X, a) \in S \times \Delta(S) \times A : a \in A(s, X)\}. \]


— The transition for player $\alpha$ at time $T_k^\alpha$ is according to the transition function $q : D \to \Delta(S)$,
which is a Lipschitz-continuous function of the global state. $q(\cdot \mid s^\alpha_{T_k^\alpha-}, X_{T_k^\alpha}, a^\alpha_{T_k^\alpha})$
denotes the distribution of the individual state of player $\alpha$ after the jump he makes at time
$T_k^\alpha$, given his previous state $s^\alpha_{T_k^\alpha-}$, his action $a^\alpha_{T_k^\alpha}$ and the state-action distribution of all
the players at time $T_k^\alpha$. In particular, a player in state $s_0$ can join the game (be reborn) at
time $T$ in state $s$ with probability $q(s \mid s_0, X_T, a_0)$.


1 We can assume there is a whole subset of such elements. 


2 Here and in the sequel, for any finite set $B$, $\Delta(B)$ denotes the set of all probability measures over $B$.
— We assume that all the players use stationary strategies, that is, they choose their actions
depending only on their current individual state and current global state. Thus any strategy
$f$ is a Borel-measurable function from $S \times \Delta(S)$ to $A$ such that for any $s \in S$ and
$X \in \Delta(S)$, $f(s, X) \in A(s, X)$. The set of all stationary strategies will be denoted by $F$.

— The changes in individual states are aggregated according to the Kurtz dynamics (see
Theorem 5.3 in [32]):

\[ \dot X_t^s = \sum_{s' \in S} \sum_{a \in A} X_t^{s'} \lambda(s', X_t)\, q(s \mid s', X_t, a)\, \bar f_a(s', X_t) - X_t^s \lambda(s, X_t), \quad s \in S, \tag{1} \]

with $X_0 = x_0$, the initial global state, where $\bar f$ denotes the average stationary policy used
by the players. This average can be defined if the function $f^\alpha(s, X)$ is jointly measurable
in $(\alpha, X)$ by the following equality

\[ \bar f_a(s, X) = \int_0^1 \mathbb{1}\{f^\alpha(s, X) = a\}\, d\alpha, \]

where $f^\alpha$ is the stationary strategy of player $\alpha$. As we will see, in all our considerations,
this will be a.e. a constant function of $\alpha$, so the joint measurability will be immediately
implied by measurability w.r.t. $X$. In the sequel, we will write $X_t(\bar f)$ for the global state
satisfying (1) when the average stationary strategy is $\bar f$.

— Given the evolution of the global state, which depends on the strategies of the players in a
deterministic manner, we can define the individual history of player $\alpha$ as the sequence of
his consecutive individual states, sojourn times and actions
$h = (s^\alpha_{T_0^\alpha}, \tau_0^\alpha, a^\alpha_{T_0^\alpha}, s^\alpha_{T_1^\alpha}, \tau_1^\alpha, a^\alpha_{T_1^\alpha}, \ldots)$.
By the Ionescu-Tulcea theorem (see Chap. 7 in [4]), for any stationary strategy $f$ of
player $\alpha$ and any initial individual state distribution $\mu_0$, there exists a unique probability
measure $P_{f,\mu_0}$ on the set of all infinite histories of the game $H = (S \times \mathbb{R}^+ \times A)^\infty$
endowed with the Borel $\sigma$-algebra, consistent with $f$, $q$ and $\mu_0$. Then player $\alpha$'s
expected total reward is defined as the integral of his immediate (per unit time) reward
function $r : D \to \mathbb{R}$ over his lifetime, plus the sum of rewards received upon the change
of state, awarded according to the function $\bar r : D \to \mathbb{R}$, which can be written as

\[ J(f, \bar f, \mu_0) = \mathbb{E}^{P_{f,\mu_0}} \left[ \sum_{i=0}^{i_e - 1} \left( \bar r\bigl(s^\alpha_{T_i^\alpha}, X_{T_i^\alpha}(\bar f), a^\alpha_{T_i^\alpha}\bigr) + \int_{T_i^\alpha}^{T_{i+1}^\alpha} r\bigl(s^\alpha_{T_i^\alpha}, X_t(\bar f), a^\alpha_{T_i^\alpha}\bigr)\, dt \right) \right], \tag{2} \]

where $T^\alpha_{i_e}$ is the moment of his first return to $s_0$ and $\mu_0$ is the initial distribution of all
the new-born players. We assume both $r$ and $\bar r$ are continuous in the global state of the
game.
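
To make the aggregation rule (1) concrete, the following minimal Python sketch integrates it with an explicit Euler scheme for a three-state toy example. Everything in it is an assumption made for illustration and not part of the model above: the state labels, the arrays returned by the hypothetical helpers `lam_toy`, `q_toy` and `f_toy`, the step size, and the restriction to a single pure strategy used by all players (so that $\bar f_a(s, X)$ is an indicator).

```python
import numpy as np

def kurtz_rhs(x, lam, q, f):
    """Right-hand side of (1) when every player uses the same pure strategy.

    x   : global state, array of shape (n,)
    lam : callable, x -> intensities lambda(s, x), shape (n,)
    q   : callable, x -> array q[s_prev, a, s_next]
    f   : callable, x -> action index chosen in each state, shape (n,)
    """
    n = len(x)
    rates = lam(x)
    # Row s' of P is q(. | s', x, f(s', x)).
    P = q(x)[np.arange(n), f(x), :]
    outflow = x * rates            # mass leaving each state per unit time
    return outflow @ P - outflow   # inflow minus outflow

def integrate(x0, lam, q, f, dt=0.01, steps=2000):
    """Explicit Euler scheme for (1); a sketch, not a production ODE solver."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * kurtz_rhs(x, lam, q, f)
    return x

# Hypothetical toy data: state 0 plays the role of s_0, actions are {0, 1}.
def lam_toy(x):
    return np.ones(3)              # constant intensity, cf. (A5)

def q_toy(x):
    q = np.zeros((3, 2, 3))
    q[0, :, :] = [0.0, 0.5, 0.5]   # rebirth distribution from s_0
    q[1, 0, :] = [0.5, 0.5, 0.0]
    q[1, 1, :] = [0.3, 0.3, 0.4]
    q[2, 0, :] = [0.5, 0.3, 0.2]
    q[2, 1, :] = [0.2, 0.3, 0.5]
    return q

def f_toy(x):
    return np.array([0, 0, 1])     # action index per state, independent of x here

x_inf = integrate([1.0, 0.0, 0.0], lam_toy, q_toy, f_toy)
```

In this toy case the intensity is constant, so the Euler iterates approach a stationary distribution of the induced jump chain; with state-dependent intensities the stationary point of (1) would instead be weighted by the mean sojourn times $1/\lambda$.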


Since the game is symmetric, the equilibrium can be defined in the following manner.
A stationary strategy $f$ and a measure $\mu \in \Delta(S)$ are in equilibrium in the semi-Markov
mean-field game with total reward if $X_0 = \mu$ implies $X_t(f) = \mu$ for every $t \geq 0$ and for
every other stationary strategy $g \in F$,

\[ J(f, f, \rho) \geq J(g, f, \rho), \]

where $\rho = q(\cdot \mid s_0, \mu, a_0)$ is the distribution of individual states of new-born players when
the global state is $\mu$.
3 Game with Strategic Complementarities


In this section, we present the results about the existence of and convergence to equilibrium 
in our game for the model under some lattice-theoretic assumptions. Since the reader may be 
unfamiliar with lattice theory, below we present a brief introduction to it with all the notions 
used in the remainder of the paper. Those interested in deepening their knowledge about this 
subject are referred to [36], where concepts of lattices and supermodularity together with 
their applications to decision and game theory are discussed in detail. 


3.1 Lattice-Theoretic Preliminaries 


Let $B$ be a partially ordered set with order $\preceq$. An element $b \in B$ is called an upper bound of
$C \subset B$ if $b \succeq c$ for every $c \in C$. Similarly, $b$ is a lower bound of $C$ if $b \preceq c$ for all $c \in C$.
We say that $b$ is a supremum or a least upper bound of $C$ in $B$ if it is an upper bound of $C$
and $b \preceq b'$ for any other upper bound $b'$ of $C$. A greatest lower bound, or an infimum,
is defined analogously. We say that $B$ is a lattice if for every $b, b' \in B$, $\sup\{b, b'\}$ and $\inf\{b, b'\}$ exist in $B$.
We say that it is a complete lattice if for every nonempty $C \subset B$, $\sup C$ and $\inf C$ exist in $B$.

Many commonly used partially ordered sets are lattices. For example, $\mathbb{R}$ is a lattice with
the usual ordering, as is any $\mathbb{R}^n$ with componentwise ordering.³ None of them is a complete
lattice though. Compact intervals of $\mathbb{R}^n$ are simple examples of complete lattices. A lattice
which will be of particular interest to us is that of Borel probability measures on $\mathbb{R}$, $\Delta(\mathbb{R})$,
with (first order) stochastic dominance ordering $\preceq_{SD}$ defined as follows:

\[ P \preceq_{SD} Q \iff \int_{\mathbb{R}} g(x)\, P(dx) \leq \int_{\mathbb{R}} g(x)\, Q(dx) \]

for any nondecreasing bounded measurable function $g : \mathbb{R} \to \mathbb{R}$.⁴ It is well known that
$P \preceq_{SD} Q$ is equivalent to $F_P(x) \geq F_Q(x)$ for any $x \in \mathbb{R}$, where $F_P$ and $F_Q$ are the cumulative
distribution functions corresponding to $P$ and $Q$ respectively. Again, $\Delta(\mathbb{R})$ is not a complete
lattice, but for any compact subset $B$ of $\mathbb{R}$, $\Delta(B)$ is complete. It has been shown in [29] that
the same is no longer true for $\mathbb{R}^2$. There, even the set of probability measures defined on
the set $\{(0, 0), (0, 1), (1, 0), (1, 1)\}$ with stochastic dominance ordering is not a lattice, so
any results based on the lattice structure of $\Delta(\mathbb{R})$ cannot be directly repeated for $\Delta(\mathbb{R}^n)$,
$n \geq 2$.

Let $B$ be a lattice. A function $f : B \to \mathbb{R}$ is nondecreasing if $b \preceq b'$ implies $f(b) \leq f(b')$.
$f$ is supermodular if $f(\sup\{b, b'\}) + f(\inf\{b, b'\}) \geq f(b) + f(b')$. If $C$ is also a lattice, we
say that a function $f : B \times C \to \mathbb{R}$ has increasing differences in $b$ and $c$ if $b \succeq b'$, $c \succeq c'$
implies $f(b, c) - f(b', c) \geq f(b, c') - f(b', c')$. Finally, a correspondence $T : B \to C$
is nondecreasing if for any $b \preceq b'$ and $c \in T(b)$, $c' \in T(b')$, $\inf\{c, c'\} \in T(b)$ and
$\sup\{c, c'\} \in T(b')$. If, instead of real-valued functions $f$, we consider a function whose
values are probability measures on $\mathbb{R}$ (a parametrized measure) with stochastic dominance
ordering, we use the terms stochastically nondecreasing and stochastically supermodular for
the counterparts of the above properties. We say that a parametrized measure $f(\cdot \mid b, c)$ has
stochastically increasing differences if $\int_{\mathbb{R}} g(a)\, f(da \mid b, c)$ has increasing differences for any
nondecreasing bounded measurable function $g$.
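
On finite chains the notions above can be verified by brute force. The following sketch, written under the assumption that a function on a product of two finite chains is stored as a table and that a probability measure on a finite ordered set is stored as a vector of weights, checks supermodularity, increasing differences and first order stochastic dominance. The helper names and the example data are hypothetical and serve only as an illustration.

```python
import itertools
import numpy as np

def is_supermodular(table):
    """table[i, j] = f(b_i, c_j) on a product of two finite chains."""
    n, m = table.shape
    points = list(itertools.product(range(n), range(m)))
    for (i, j), (k, l) in itertools.product(points, repeat=2):
        sup_val = table[max(i, k), max(j, l)]   # f at the componentwise sup
        inf_val = table[min(i, k), min(j, l)]   # f at the componentwise inf
        if sup_val + inf_val < table[i, j] + table[k, l] - 1e-12:
            return False
    return True

def has_increasing_differences(table):
    """f(b, c) - f(b', c) is nondecreasing in c whenever b >= b'.

    Consecutive differences suffice, since any difference is a sum of them.
    """
    diffs_in_b = table[1:, :] - table[:-1, :]
    return bool(np.all(np.diff(diffs_in_b, axis=1) >= -1e-12))

def dominated(p, q):
    """True if p <=_SD q, using the CDF characterization F_p >= F_q."""
    return bool(np.all(np.cumsum(p) >= np.cumsum(q) - 1e-12))

# Example data: f(b, c) = b * c is supermodular, its negative is not.
product_table = np.outer(np.arange(3), np.arange(4)).astype(float)
p_low = np.array([0.5, 0.5, 0.0])
p_high = np.array([0.0, 0.5, 0.5])
```

For a function of two scalar arguments the two checks agree, which is why supermodularity in $(s, a)$ can be read as increasing differences in $s$ and $a$.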


3 In the remainder of the paper we will use the symbol $\leq$ for ordering in $\mathbb{R}$, while $\preceq$ will be used to denote
componentwise ordering in $\mathbb{R}^n$.


4 The symbol $\preceq_{SD}$ will be used throughout the paper to denote stochastic dominance ordering.
3.2 Assumptions


Below we present the set of assumptions for the model considered in our paper. These 
assumptions (except the first one) are not necessary for the model to make sense, but will be 
used either to prove the existence of equilibria there or in some further results. 


(A1) There exists a $p_0 > 0$ such that for any fixed global state $\mu$ and under any stationary
policy $f$ the probability of getting from any state $s \in S \setminus \{s_0\}$ to $s_0$ in $|S| - 1$ steps
is not smaller than $p_0$.

(A2) $S$ and $A$ are sublattices⁵ of $\mathbb{R}$ with $s_0 = \min\{S\}$ and $a_0 = \min\{A\}$, and for any $s \in S$
and $X \in \Delta(S)$, $A(s, X)$ is a sublattice of $A$. Moreover, $A(s, X)$ is nondecreasing in
$(s, X)$.

(A3) $r(s, X, a)$ and $\bar r(s, X, a)$ are nonnegative, nondecreasing in $s$ and supermodular in
$(s, a)$. Moreover, they have increasing differences in $(s, a)$ and $X$.

(A4) $q(\cdot \mid s, X, a)$ is stochastically supermodular in $(s, a)$ and stochastically nondecreasing
in $s$, $a$ and $X$. Moreover, it has stochastically increasing differences in $(s, a)$ and $X$.

(A5) $\lambda(s, X)$ does not depend on $s$ and is nonincreasing in $X$.

(A6) The value of $A(s, X)$ does not depend on $X$.⁶


Remark 1 Note that some of the above assumptions can be slightly relaxed if, instead of
considering each of the functions defining the game separately, some combinations of them
were characterized. In particular, assumptions (A3) and (A5) could be relaxed if we did not
assume the positivity, monotonicity and supermodularity of each of $r$, $\bar r$ and $\lambda^{-1}$, but rather
assumed that $\bar r(s, X, a) + \frac{r(s, X, a)}{\lambda(s, X)}$ is nonnegative, nondecreasing in $s$, supermodular in $(s, a)$
and has increasing differences in $(s, a)$ and $X$.


Remark 2 The assumption (A3) can be slightly generalized by considering the reward functions
$r$ and $\bar r$ depending not only on the individual state $s$ and the action $a$ of a given player
and the global state $X$ but also on the global distribution of actions, which we denote by $Z$.
Then we could assume the following:


(A3') $r(s, X, a, Z)$ and $\bar r(s, X, a, Z)$ are nonnegative, nondecreasing in $s$ and supermodular in
$(s, a)$. Moreover, they have increasing differences in $(s, a)$ and $(X, Z)$.


The proofs of Theorems 1 and 2 can be repeated when (A3) is replaced with (A3’) in the 
assumptions, although they become more complex notationally. 


Remark 3 Our supermodularity/increasing differences assumptions are closely related to 
the monotonicity assumptions used by Lasry and Lions [26] to establish the uniqueness of 
equilibrium solution in a mean-field game. The assumptions of this type have been extensively 
used in the mean-field game literature, also for games with finite state space [10]. The 
formulations of these assumptions may slightly differ depending on other assumptions that 
are made, but they all can be viewed as very close to requiring strictly increasing differences in 
individual and global states of some function related to the Hamiltonian corresponding to the 
immediate reward (cost) function (or the immediate reward itself, see e.g. [12]) as well as of 
the terminal reward (cost). In our assumptions we require weak supermodularity and weakly 
increasing (nondecreasing) differences of the functions defining our model. It is easy to see 
that in a degenerate case when each of the functions $r$, $\bar r$ and $q$ is constant on $S \times \Delta(S) \times A$,


5 Note that since they both are finite, they are clearly complete lattices. 


6 Alternatively we could write that the correspondence $A(\cdot, \cdot)$ is continuous, but, since the set $A$ is finite, this
reduces to this seemingly more restrictive assumption.
our assumptions will not be violated, while any of the monotonicity assumptions used in the
literature will.⁷ This is natural, as we do not expect uniqueness of equilibrium in our model,
but rather a special structure of equilibrium strategy set. Similarly, monotonic mean-field 
game models typically require some convexity assumptions to hold. In our case no convexity 
in any variable is assumed. On the other hand, apart from assuming increasing differences 
in individual and global state we make additional (weak) monotonicity assumptions about 
functions defining our model. 


Remark 4 There is no discounting in our model, as (A1) guarantees that the expected rewards 
for the players are bounded. Note however that adding discounting does not change our results, 
so if some real-life application requires adding it to the model (which is often the case in 
economics), one is free to do so. 


The assumptions of the strategic complementarity type have been used in the game- 
theoretic literature for a long time. A review of results for one-step games can be found 
in [36]. Some results about dynamic games with strategic complementarities can be found 
in [2,3,8,15,30,33,37]. A model of discounted dynamic games with continuum of players 
satisfying similar assumptions can be found in [1]. A general intuition about this type of 
conditions is the following: Strategic complementarity between some two quantities describes 
a situation when they mutually reinforce one another, that is an increase in one of them 
implies that it is profitable to increase the other one and vice versa. In dynamic games with 
complementarities we usually assume that strategic complementarity takes place between 
individual states of players, so an increase in one’s state makes increase in others’ state 
profitable. In addition, we usually assume (as we do here) that there is a complementarity 
between player’s actions and his states, so that an increase in the state makes higher actions 
more profitable. Finally, we also need to make some monotonicity assumptions about the 
immediate rewards and the transition law, which are crucial for the aggregate reward of a 
player to preserve the strategic complementarity of immediate reward functions. It turns out 
that many games possess this kind of properties, as seen in the example below. It should 
also be noted that many real-life applications can be modelled as total reward semi-Markov 
mean-field games with complementarities. Some of them are presented in Sect. 3.5. 


Example 1 While some of the assumptions (A1)-(A6) are rather clear, it may be difficult for
those not familiar with the theory of supermodular functions to see what kind of functions satisfy
assumptions (A3) and (A4). Below we present some examples. Functions $r$ and $\bar r$ satisfying
(A3) can be of any of the following forms:

\[ \alpha(s)\beta(a)E[\gamma(X)], \tag{3} \]
\[ \min\{\alpha(s), \beta(a), E[\gamma(X)]\}, \tag{4} \]

where $\alpha : S \to \mathbb{R}$, $\beta : A \to \mathbb{R}$, $\gamma : S \to \mathbb{R}$ are any nonnegative nondecreasing functions.
They can also be of the form

\[ c_1 \alpha(s) + c_2 \beta(a) + c_3 \gamma(X), \tag{5} \]

where $\alpha$ is a nonnegative nondecreasing function, $\beta$ and $\gamma$ are any nonnegative functions
of respective variables, while the constants $c_1, c_2, c_3 > 0$. Finally, they can be any conical


7 Of course the same will be true if these functions are constant on some properly chosen subset of $S \times \Delta(S) \times A$.
combination of functions of forms (3)-(5), as well as of a quadratic function of the form⁸

\[ -E[(\beta(a) - \gamma(X))^2], \]

where $\beta$ and $\gamma$ are nondecreasing, provided it is nonnegative.
An example of the transition law satisfying (A4) was given by Nowak [30]:

\[ q(\cdot \mid s, X, a) = f(s, X, a)\, q_1(\cdot \mid s, X, a) + (1 - f(s, X, a))\, q_2(\cdot \mid s, X, a), \]

where $q_1 \succeq_{SD} q_2$, while $f : S \times \Delta(S) \times A \to [0, 1]$ is supermodular in $(s, a)$ and nondecreasing
in $s$, $a$ and $X$. Moreover, it has increasing differences in $(s, a)$ and $X$. Such a function
can be constructed as a conic combination of functions (3)-(5) under the additional condition that
all the functions $\alpha$, $\beta$, $\gamma$ are nondecreasing.
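
The mixture construction above can be checked numerically in a small case. The sketch below makes several simplifying assumptions that are not in the text: the global state is frozen (so $X$ is dropped), $q_1$ and $q_2$ are fixed measures that do not depend on $(s, a)$, and the mixing weight is the product $\alpha(s)\beta(a)$ with hypothetical values. Since every nondecreasing function on a finite chain is a constant plus a conic combination of upper-tail indicators, it suffices to test the upper-tail probabilities.

```python
import numpy as np

# Hypothetical data: three individual states, two actions, global state frozen.
q1 = np.array([0.1, 0.3, 0.6])    # stochastically larger measure
q2 = np.array([0.6, 0.3, 0.1])    # stochastically smaller measure
alpha = np.array([0.2, 0.5, 1.0]) # nondecreasing in s
beta = np.array([0.4, 0.9])       # nondecreasing in a
weight = np.outer(alpha, beta)    # mixing weight f(s, a) in [0, 1], cf. form (3)

# q[s, a, :] = f(s, a) * q1 + (1 - f(s, a)) * q2
q = weight[:, :, None] * q1 + (1.0 - weight[:, :, None]) * q2

def upper_tail(q, t):
    """Probability of {s' >= t} under q(. | s, a): expectation of an increasing indicator."""
    return q[:, :, t:].sum(axis=2)

def is_nondecreasing(table):
    """Nondecreasing in both s (axis 0) and a (axis 1)."""
    return bool(np.all(np.diff(table, axis=0) >= -1e-12)
                and np.all(np.diff(table, axis=1) >= -1e-12))

def is_supermodular(table):
    """Mixed second differences are nonnegative on a product of two chains."""
    return bool(np.all(np.diff(np.diff(table, axis=0), axis=1) >= -1e-12))

checks = [(is_nondecreasing(upper_tail(q, t)), is_supermodular(upper_tail(q, t)))
          for t in range(3)]
```

Each upper-tail probability equals the tail of $q_2$ plus the weight times a nonnegative constant, so it inherits monotonicity and supermodularity from the weight, which is what the check confirms on this example.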


3.3 Existence of Equilibrium 


Now we can formulate the main result of this section. 


Theorem 1 A semi-Markov mean-field game with total reward satisfying assumptions (A1)-(A5)
has an equilibrium $(f^*, \mu^*)$ such that $f^*$ is nondecreasing in the individual state and in $\mu$.


Many of the arguments used in the proof are taken from [1] where discrete-time discounted 
mean-field games with strategic complementarities were considered. Whenever some results 
appearing there can be used here in an unchanged form, we refer the reader to some specific 
results in that paper. To start with, we need to introduce for any fixed global state X an auxiliary 
dynamic optimization model $M(X)$. Suppose an individual controls a discrete-time Markov
decision process with total reward, with


(a) the state space $S$ and the action space $A$;
(b) the initial distribution of states $\mu_0$;
(c) the transition probabilities⁹

\[ Q_X(\cdot \mid s_t, a_t) = \begin{cases} q(\cdot \mid s_t, X, a_t), & \text{for any } s_t \neq s_0, \\ \delta[s_0], & \text{for } s_t = s_0, \end{cases} \]

so $s_0$ now becomes absorbing;
(d) the reward per stage given by the equality

\[ R_X(s_t, a_t) = \bar r(s_t, X, a_t) + \frac{r(s_t, X, a_t)}{\lambda(s_t, X)}. \]


8 If we assume that the reward functions depend also on the distribution of actions among the players $Z$ (see
Remark 2), then we can also add the quadratic function of the form

\[ -E[(\beta(a) - \gamma(Z))^2] \]

multiplied by a positive constant. In case of quadratic functions that depend only on individual and global
state, a function of the form

\[ -E[(\alpha(s) - \gamma(X))^2] \]

can only appear in a conic combination with some nonnegative nondecreasing function of $s$, such that the sum
is also nonnegative and nondecreasing in $s$.


9 Here and in the sequel $\delta[x]$ denotes a degenerate probability measure concentrated at the point $x$.
Note that for any stationary strategy $f$, the reward received by the controller using $f$ in this
model equals the total reward (2) in case the global state induced by the strategies of the other
players is fixed and equal to $X$. Note also that this is a classic Markov decision process with
total reward, as considered in the literature, and so standard dynamic programming arguments imply that:


(a) Since assumptions (A1) and (A2) hold, the optimal value in this model is finite.
(b) The optimal value in this model, $V_X^*$, has to satisfy for any $s \in S$ the following Bellman
equation:

\[ V_X^*(s) = \max_{a \in A(s, X)} \left[ R_X(s, a) + \sum_{s' \in S} V_X^*(s')\, Q_X(s' \mid s, a) \right]. \tag{6} \]

(c) $A$ is finite, and thus compact, which justifies writing 'max' rather than 'sup' in (6);
moreover, optimal stationary strategies in $M(X)$ exist and can be identified as any
strategies maximizing the RHS of (6).
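
As an illustration of how (6) can be solved at a fixed global state, here is a minimal value-iteration sketch. It assumes that $R_X$ and $Q_X$ are already tabulated as arrays and that the feasible sets $A(s, X)$ are encoded by a boolean mask; the function name `solve_auxiliary_mdp` and all toy numbers are hypothetical. Convergence relies on the absorbing state being reached with positive probability from every state, in the spirit of (A1).

```python
import numpy as np

def solve_auxiliary_mdp(R, Q, feasible, tol=1e-10, max_iter=10_000):
    """Value iteration for (6) at a fixed global state X.

    R        : R[s, a], reward per stage R_X(s, a)
    Q        : Q[s, a, s'], transition probabilities Q_X(s' | s, a)
    feasible : boolean mask, feasible[s, a] is True iff a is in A(s, X)
    State 0 plays the role of s_0 and must be absorbing with zero reward.
    Returns the value function and the table of RHS values of (6).
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        values = np.where(feasible, R + Q @ V, -np.inf)  # values[s, a]
        V_new = values.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, values
        V = V_new
    raise RuntimeError("value iteration did not converge")

# Hypothetical toy model: action 0 is the default action a_0, used only in s_0.
feasible = np.array([[True, False, False],
                     [False, True, True],
                     [False, True, True]])
R = np.array([[0.0, 0.0, 0.0],
              [0.0, 1.0, 0.8],
              [0.0, 2.0, 1.5]])
Q = np.zeros((3, 3, 3))
Q[:, :, 0] = 1.0                  # placeholder rows (infeasible pairs, absorbing s_0)
Q[1, 1, :] = [0.5, 0.5, 0.0]
Q[1, 2, :] = [0.3, 0.3, 0.4]
Q[2, 1, :] = [0.5, 0.3, 0.2]
Q[2, 2, :] = [0.2, 0.3, 0.5]

V_star, rhs_values = solve_auxiliary_mdp(R, Q, feasible)
```

Starting the iteration from the zero function mirrors the construction used in the proof of Lemma 1 below, where the value function is obtained as a limit of iterates of the Bellman operator.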


In the first lemma we show the main properties of $V_X^*$.
Lemma 1 $V_X^*(s)$ is nondecreasing in $s$ and has increasing differences in $s$ and $X$.


Proof The proof for the most part repeats the arguments used in [1]. It will be broken
into three claims. Before we formulate the first one, we need to note two facts. First,
$R(s, X, a) = \frac{r(s, X, a)}{\lambda(s, X)}$ is, by assumptions (A3) and (A5), a product of two functions that are
nonnegative, nondecreasing in $s$ and supermodular in $(s, a)$. As such, $R$ preserves all these
properties. Next, since $(\lambda(s, X))^{-1}$ is nonnegative, constant in $s$ and nondecreasing in $X$,
while $r$ has increasing differences in $(s, a)$ and $X$,

\[ R(s, X, a) - R(s', X, a') = \frac{r(s, X, a)}{\lambda(s, X)} - \frac{r(s', X, a')}{\lambda(s', X)} = \frac{1}{\lambda(s, X)} \bigl( r(s, X, a) - r(s', X, a') \bigr), \]

so for $(s, a) \succeq (s', a')$ it is a product of two nonnegative nondecreasing functions of $X$, thus
a nondecreasing function itself. This means that $R$ has increasing differences in $(s, a)$ and $X$.
Monotonicity, supermodularity, and increasing differences are preserved upon summation,
so (by (A3)) $R_X(s, a) = \bar r(s, X, a) + \frac{r(s, X, a)}{\lambda(s, X)}$ also has all these properties.

Second, note that

\[ Q_X(\cdot \mid s, a) = \begin{cases} q(\cdot \mid s, X, a), & \text{for any } s \neq s_0, \\ \delta[s_0], & \text{for } s = s_0, \end{cases} \]

preserves all the properties of $q$, as:


(a) $\delta[s_0]$ is stochastically smaller than any other probability distribution over $S$, and so $Q_X$
trivially stays stochastically nondecreasing in $(s, a)$.

(b) Stochastically increasing differences in $(s, a)$ and $X$ are preserved because for
$(s, a) \succ (s_0, a_0)$, $Q_X(\cdot \mid s, a) = q(\cdot \mid s, X, a)$ is stochastically nondecreasing in $X$, while
$Q_X(\cdot \mid s_0, a_0)$ is constant.

(c) Supermodularity in $(s, a)$ in $D$ is trivial, as $(s_0, a_0) \preceq (s, a)$ for any $(s, a) \neq (s_0, a_0)$,
and so always $\sup\{(s_0, a_0), (s, a)\} = (s, a)$ and $\inf\{(s_0, a_0), (s, a)\} = (s_0, a_0)$.


Now we can pass to the main part of the proof. 


Claim 1 Let $v$ be a bounded function of $s$ and $X$, nondecreasing in $s$ and having increasing
differences in $s$ and $X$. Then

\[ w(s, X, a) = \sum_{s' \in S} v(s', X)\, Q_X(s' \mid s, a) \]

is nondecreasing in $s$ and $a$, supermodular in $(s, a)$, and has increasing differences in $(s, a)$
and $X$.


This claim has been shown in [1] as Lemma 3. 


Claim 2 Let $v$ be a bounded function of $s$ and $X$, nondecreasing in $s$ and having increasing
differences in $s$ and $X$. Then

\[ T(s, X)(v) = \max_{a \in A(s, X)} \left[ R_X(s, a) + \sum_{s' \in S} v(s', X)\, Q_X(s' \mid s, a) \right] \]

is nondecreasing in $s$ and has increasing differences in $s$ and $X$.

This claim has been shown in [1] as Lemma 4.

Claim 3 $V_X^*(s)$ is nondecreasing in $s$ and has increasing differences in $s$ and $X$.

By assumption (A1) we can write that for any two bounded functions $v$, $w$ of $(s, X)$:

\[ \max_{s \in S,\, X \in \Delta(S)} \left| T^{|S|}(s, X)(v) - T^{|S|}(s, X)(w) \right| \leq \Bigl( 1 - \min_{s \in S,\, a \in A,\, X \in \Delta(S)} q^{|S|}(s_0 \mid s, X, a) \Bigr) \max_{s \in S,\, X \in \Delta(S)} |v(s, X) - w(s, X)| \leq (1 - p_0) \max_{s \in S,\, X \in \Delta(S)} |v(s, X) - w(s, X)|, \]

and so $T^{|S|}$ is a contraction. Since the set of bounded functions of $(s, X)$ which are
nondecreasing in $s$ and have increasing differences in $s$ and $X$ is a closed subset of a complete
metric space of bounded functions from $S \times \Delta(S)$ to $\mathbb{R}$, it is also a complete metric space,
and consequently $T^{|S|}$ has a unique fixed point in this set.

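Now take $V_X^0 \equiv 0$ and define for $k \geq 0$

\[ V_X^{k+1}(s) = \max_{a \in A(s, X)} \left[ R_X(s, a) + \sum_{s' \in S} V_X^k(s')\, Q_X(s' \mid s, a) \right]. \]

It is clear that $V_X^k(s) = T^k(s, X)(V^0)$. Consequently, $V_X^*(s) = \lim_{k \to \infty} V_X^k(s) =
\lim_{k \to \infty} T^{k|S|}(s, X)(V^0)$, which equals the fixed point of $T^{|S|}$. This proves that $V_X^*(s)$ has all
the desired properties. □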
Next, let us define a correspondence that can be viewed as a best response operator:

\[ B(X)(s) = \arg\max_{a \in A(s, X)} \left[ R_X(s, a) + \sum_{s' \in S} V_X^*(s')\, Q_X(s' \mid s, a) \right]. \]

Next, let $\underline{B}(X)$ and $\overline{B}(X)$ denote the smallest and the biggest best responses, that is

\[ \underline{B}(X)(s) = \min B(X)(s), \qquad \overline{B}(X)(s) = \max B(X)(s). \]
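
Once the right-hand side of (6) has been tabulated at the optimal value function, the smallest and the biggest best responses can be read off state by state. The sketch below assumes such a table is given, with infeasible actions marked by minus infinity and actions indexed in increasing order; the function name `extreme_best_responses`, the tie tolerance and the example table are hypothetical.

```python
import numpy as np

def extreme_best_responses(values, tol=1e-9):
    """Smallest and largest maximizers of each row of values[s, a].

    values[s, a] is assumed to hold the bracketed expression in the
    definition of B(X)(s), with -inf for actions outside A(s, X).
    Returns two integer arrays of action indices, one entry per state.
    """
    best = values.max(axis=1, keepdims=True)
    is_best = values >= best - tol                 # argmax set, up to tol
    n_actions = values.shape[1]
    smallest = is_best.argmax(axis=1)              # index of the first maximizer
    largest = n_actions - 1 - is_best[:, ::-1].argmax(axis=1)  # index of the last one
    return smallest, largest

# Hypothetical table: state 0 can only use action 0, state 1 has a tie.
toy_values = np.array([[0.0, -np.inf, -np.inf],
                       [-np.inf, 2.0, 2.0],
                       [-np.inf, 3.0, 3.5]])
lowest, highest = extreme_best_responses(toy_values)
# lowest is [0, 1, 2] and highest is [0, 2, 2]
```

In this example both selections are nondecreasing in the state index, which is the kind of monotone structure established for the extreme best responses in the next lemma.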


The fact that they are both well defined, as well as their crucial properties, are shown in the 
following lemma. 


Lemma 2 $B(X)$ is nondecreasing in $(s, X)$. Moreover, $\underline{B}(X)(s)$ and $\overline{B}(X)(s)$ are well
defined, nondecreasing in $X$ and, for a fixed $X$, also nondecreasing in $s$.




Dyn Games Appl (2017) 7:507-529 517 


Proof The proof is based on two results by Topkis. First, define

\[ f(a, s, X) = R_X(s, a) + \sum_{s' \in S} V_X^*(s')\, Q_X(s' \mid s, a). \]

By Lemma 1, $V_X^*(s)$ is nondecreasing in $s$ and has increasing differences in $s$ and $X$. Next,
we can use Claim 1 of that lemma to show that this implies that $\sum_{s' \in S} V_X^*(s')\, Q_X(s' \mid s, a)$ is
nondecreasing in $s$, supermodular in $(s, a)$, and has increasing differences in $(s, a)$ and $X$.
Since $R_X(s, a)$ also has these properties (which was shown at the beginning of the proof of
Lemma 1) and as they are preserved under summation, $f(a, s, X)$ is also nondecreasing in
$s$, supermodular in $(s, a)$, and has increasing differences in $(s, a)$ and $X$. Note also that by
assumption (A2) $A(s, X)$ is nondecreasing in $(s, X)$. Now we can apply Theorem 2.8.1 in
[36] to obtain the first part of the lemma. The second statement follows from Theorem 2.8.3(a)
in [36].


In the next lemma we come back to the original game model and analyze the properties 
of stationary individual state distributions when a player applies a given stationary strategy. 


Lemma 3 Suppose that the global state of the game is constant and equal to $X$. Then the
smallest stationary state distribution corresponding to a stationary strategy

\[ f \in F_0 := \{g \in F : g(s, X) \text{ is nondecreasing in } X \text{ and, for any fixed } X, \text{ in } s\}, \]

$\underline{X}(f, X)$, and the greatest stationary state distribution corresponding to $f$, $\overline{X}(f, X)$, are
nondecreasing functions of $f$ and $X$ on $F_0 \times \Delta(S)$.


Proof First, note that a stationary global state $Y$ corresponding to the stationary strategy $f$
used by all the players and the fixed global state of the game $X$ must satisfy for every $s \in S$
the following equation:
$$\sum_{s' \in S} \sum_{a \in A} Y^{s'} \lambda(s',X)\, q(s|s',X,a)\, f_a(s',X) - Y^{s} \lambda(s,X) = 0.$$

Note however that by (A5) $\lambda(s,X)$ does not depend on $s$. As it is always nonzero, we can
cancel out all the $\lambda$ terms from the above equation, obtaining
$$Y^{s} = \sum_{s' \in S} \sum_{a \in A} Y^{s'} q(s|s',X,a)\, f_a(s',X).$$


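Clearly, by (A4) and the fact that $f$ is nondecreasing, $q(\cdot|s',X,f(s',X))$ is stochastically
nondecreasing in $s'$ and $X$, as well as in $f$, as long as strategies from $F_0$ are applied.
Now define $\phi : \Delta(S) \times \Delta(S) \times F_0 \to \Delta(S)$ by
$$\phi(Y,X,f) = \sum_{s' \in S} Y^{s'} q(\cdot|s',X,f(s',X)).$$
We will show that this is a nondecreasing function. Let $Y \preceq_{SD} \widetilde{Y}$, $f \le \widetilde{f}$ and $X \preceq_{SD} \widetilde{X}$. As
$q(\cdot|s',X,f(s',X))$ is stochastically nondecreasing in $X$ and $f$, clearly
$$\sum_{s \in S} w(s)\, q(s|s',X,f(s',X)) \le \sum_{s \in S} w(s)\, q(s|s',\widetilde{X},\widetilde{f}(s',\widetilde{X})) \tag{8}$$
for any $s' \in S$ and any bounded nondecreasing function $w : S \to \mathbb{R}$. This implies that
$$\sum_{s \in S} w(s)\, \phi^{s}(\widetilde{Y},\widetilde{X},\widetilde{f}) = \sum_{s \in S} w(s) \sum_{s' \in S} \widetilde{Y}^{s'} q(s|s',\widetilde{X},\widetilde{f}(s',\widetilde{X}))$$
$$= \sum_{s' \in S} \widetilde{Y}^{s'} \Big[ \sum_{s \in S} w(s)\, q(s|s',\widetilde{X},\widetilde{f}(s',\widetilde{X})) \Big] \ge \sum_{s' \in S} \widetilde{Y}^{s'} \Big[ \sum_{s \in S} w(s)\, q(s|s',X,f(s',X)) \Big].$$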


Now note that since $q(\cdot|s',X,f(s',X))$ is stochastically nondecreasing in $s'$, the expression
in brackets is a nondecreasing function of $s'$, and so since $Y \preceq_{SD} \widetilde{Y}$, the RHS of the last
inequality is not smaller than
$$\sum_{s' \in S} Y^{s'} \Big[ \sum_{s \in S} w(s)\, q(s|s',X,f(s',X)) \Big] = \sum_{s \in S} w(s)\, \phi^{s}(Y,X,f),$$
proving that $\phi$ is nondecreasing. Now we can apply Theorem 3 in [28] to show that for any
$X \in \Delta(S)$, $f \in F_0$, there exists a $Y \in \Delta(S)$ such that
$$Y = \phi(Y,X,f). \tag{9}$$

Moreover, the greatest and the smallest $Y$ satisfying (9), that is the greatest and the smallest
stationary distributions corresponding to $X$ and $f$, $\overline{X}(f,X)$ and $\underline{X}(f,X)$, are nondecreasing
in $f$. □
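To make the notions of smallest and greatest stationary distribution concrete, here is a minimal numerical sketch. It assumes a hypothetical four-state kernel (the matrix `P` below, not taken from the paper) standing for $q(\cdot|s',X,f(s',X))$ at a fixed $X$ and $f$, and it reaches the two extreme fixed points of $Y \mapsto \phi(Y,X,f)$ by iterating from the point masses at the lowest and the highest state. Iteration is only one simple way of locating them; the proof itself relies on the fixed-point theorem cited from [28].

```python
import numpy as np

# Hypothetical kernel for fixed X and f: row s' is q(.|s', X, f(s', X)).
# Rows are stochastically nondecreasing in s'; {0, 1} and {2, 3} are closed classes,
# so several stationary distributions exist.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.0, 0.0, 0.5, 0.5]])

def phi(Y):
    # phi(Y, X, f) = sum over s' of Y^{s'} q(.|s', X, f(s', X)), with X and f held fixed.
    return Y @ P

def iterate_to_fixed_point(Y0, tol=1e-12, max_iter=10_000):
    Y = np.asarray(Y0, dtype=float)
    for _ in range(max_iter):
        Y_next = phi(Y)
        if np.abs(Y_next - Y).max() < tol:
            return Y_next
        Y = Y_next
    raise RuntimeError("iteration did not converge")

smallest = iterate_to_fixed_point([1.0, 0.0, 0.0, 0.0])  # start from delta[min S]
greatest = iterate_to_fixed_point([0.0, 0.0, 0.0, 1.0])  # start from delta[max S]
```

In this toy case the two limits are different distributions, and the first is stochastically dominated by the second.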


Proof of Theorem 1 Define
$$\underline{W}(X) = \underline{X}(\underline{B}, X) \quad \text{and} \quad \overline{W}(X) = \overline{X}(\overline{B}, X).$$

Both functions are nondecreasing in $X$ (as superpositions of functions that are nondecreasing
by Lemmas 2 and 3 respectively) and defined on a nonempty complete lattice $\Delta(S)$. Thus by
Tarski's theorem [34] each of them has a fixed point, which clearly defines an equilibrium in
the game. Note also that by Lemma 2 the equilibrium stationary strategies ($\underline{B}$ and $\overline{B}$ respectively)
are nondecreasing in $s$ and $X$.


3.4 Distributed Learning 


In the next part of this section we present a distributed iterative algorithm allowing players
to learn to play the game. Algorithms of this kind are known to exist for some types of
games, and games with strategic complementarities are one of them. The very
simple and intuitive algorithm presented below is an adaptation to our game of an algorithm
presented in [1].


Algorithm 1 (Lower Myopic Learning) For each time moment t > 0 repeat the following 
steps: 


1. Every player making his move at time $t$ observes the current population state $X_t$.
2. A player in the individual state $s$ chooses action $a_t = \underline{B}(X_t)(s)$.
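The two steps above can be sketched as a small simulation. Everything model-specific here is an assumption made for illustration: a three-state toy space, a constant jump intensity `LAM`, a hypothetical threshold rule `best_response` standing in for the smallest best response $\underline{B}(X)$, and a toy transition law `kernel` that is stochastically nondecreasing in the state, the population state and the action. The global state is advanced by an Euler step of $\dot{X}_t = \lambda\,(\phi(X_t, X_t, \underline{B}(X_t)) - X_t)$, which is the form the drift takes when the intensity does not depend on the state.

```python
import numpy as np

S = np.arange(3)   # toy individual states 0 < 1 < 2
LAM = 1.0          # jump intensity, constant in s as under (A5)

def best_response(X):
    # Hypothetical stand-in for the smallest best response B(X)(s):
    # action 1 once own state plus the population mean reaches 1.5.
    return (S + X @ S >= 1.5).astype(int)

def kernel(s, X, a):
    # Toy law q(.|s, X, a), stochastically nondecreasing in s, X and a.
    p_up = 0.2 + 0.3 * a + 0.1 * (X @ S)
    p_down = 0.2
    q = np.zeros(len(S))
    q[min(s + 1, S[-1])] += p_up
    q[max(s - 1, 0)] += p_down
    q[s] += 1.0 - p_up - p_down
    return q

def phi(Y, X, f):
    # phi(Y, X, f) = sum over s' of Y^{s'} q(.|s', X, f(s')).
    return sum(Y[s] * kernel(s, X, f[s]) for s in S)

def simulate(X0, steps=3000, dt=0.01):
    # Euler scheme: every player who moves at time t plays best_response(X_t).
    path = [np.asarray(X0, dtype=float)]
    for _ in range(steps):
        X = path[-1]
        path.append(X + dt * LAM * (phi(X, X, best_response(X)) - X))
    return np.array(path)

path = simulate([1.0, 0.0, 0.0])   # all players start in the lowest state
```

Starting from the point mass at the lowest state, the simulated population state moves upward in the stochastic dominance order along the whole path, which is the behavior the learning procedure is designed to produce.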


The following theorem summarizes main properties of the Lower Myopic Learning Algo- 
rithm. 


Theorem 2 Suppose assumptions (A1–A6) are satisfied. Additionally assume that the initial
state of the game $X_0$ satisfies the inequality
$$X_0 \preceq_{SD} \phi(X_0, X_0, \underline{B}(X_0)) \tag{10}$$
and that all the players adjust their strategies according to the Lower Myopic Learning
Algorithm. Then:

(a) For every player $\alpha$, $a^{\alpha}_{T^{\alpha}_{i+1}} \ge a^{\alpha}_{T^{\alpha}_{i}}$, $i = 0, 1, \ldots, i^{\alpha}_{e} - 1$.

(b) $X_t$ is an increasing function of $t$ converging to some $X^{*}$ as $t \to \infty$, such that $(\underline{B}, X^{*})$ are
an equilibrium in the game.


One lemma will be used in the proof of the above theorem. 


Lemma 4 Suppose that assumptions (A1–A6) are satisfied and that $X^{*}, X_t \in \Delta(S)$, $t \in \mathbb{R}^{+}$, are
such that $X_t \to X^{*}$. If $f$ is a stationary strategy such that
$$f(X_t, s) \to_{t \to \infty} f(X^{*}, s) \quad \text{for any } s \in S, \tag{11}$$
then for any $s \in S$ the reward from using policy $f$ in model $M(X_t)$, $J_{X_t}(f,s)$, converges to
the reward from using $f$ in $M(X^{*})$, $J_{X^{*}}(f,s)$, as $t$ goes to infinity.


Proof For any bounded function $v : S \times \Delta(S) \to \mathbb{R}$, nondecreasing in $s$, such that
$$v(s, X_t) \to v(s, X^{*}) \quad \text{for any } s \in S \tag{12}$$
let us define the operator
$$K_f(s,X)(v) = R_X(s, f(X,s)) + \sum_{s' \in S} v(s',X)\, Q_X(s'|s, f(X,s)).$$

It is clear that for any $s \in S$, (11) together with the continuity of $r$, $\tilde{r}$ and $\lambda$ implies the
following:
$$\lim_{t \to \infty} R_{X_t}(s, f(X_t,s)) = \lim_{t \to \infty} \Big[ \tilde{r}(s, X_t, f(X_t,s)) + \frac{r(s, X_t, f(X_t,s))}{\lambda(s, X_t)} \Big]$$
$$= \tilde{r}(s, X^{*}, f(X^{*},s)) + \frac{r(s, X^{*}, f(X^{*},s))}{\lambda(s, X^{*})} = R_{X^{*}}(s, f(X^{*},s)).$$

Then, also
$$\lim_{t \to \infty} \sum_{s' \in S} v(s', X_t)\, Q_{X_t}(s'|s, f(X_t,s)) = \sum_{s' \in S} v(s', X^{*})\, Q_{X^{*}}(s'|s, f(X^{*},s)),$$
by (11), (12) and the continuity of $Q$. This obviously implies that
$$\lim_{t \to \infty} K_f(s, X_t)(v) = K_f(s, X^{*})(v).$$

Consequently, by induction the same is true for $K_f^{k}(s,X)(v)$ with $k \in \mathbb{N}$.

Next note that $r$, $\tilde{r}$ and $\lambda$ are continuous on a compact domain, hence bounded. Let $L$
be such that $|r(s,X,a)| < L$, $|\tilde{r}(s,X,a)| < L$ and $\lambda(s,X) < L$ for any $(s,X,a) \in D$. In
addition $\lambda$ is by assumption positive, so there also exists a $\underline{\lambda} > 0$ such that $\lambda(s,X) > \underline{\lambda}$ for
any $(s,X,a) \in D$. Consequently, $|R_X(s,a)| < L + \frac{L}{\underline{\lambda}}$ for any $(s,X,a) \in D$. Further note
that by (A1) for any $X$, $s$, $f$ and $v$,


[>| 


im, KG, XV) — Ks, XV 


< (z + =) ad — po) Lt 20 po)! _ a 1) a po) Let], 


Thus, for any $\varepsilon > 0$, $\big| \lim_{m \to \infty} K_f^{m}(s,X)(v) - K_f^{k}(s,X)(v) \big| < \frac{\varepsilon}{2}$ for $k$ big enough, say
$k > k_0$. Consequently,
$$\Big| \lim_{t \to \infty} \lim_{m \to \infty} K_f^{m}(s,X_t)(v) - \lim_{m \to \infty} K_f^{m}(s,X^{*})(v) \Big| \le \Big| \lim_{t \to \infty} K_f^{k}(s,X_t)(v) - K_f^{k}(s,X^{*})(v) \Big| + \varepsilon = \varepsilon.$$

Note however that this, in view of the arbitrariness of $\varepsilon$ and because both limits on the LHS of
the above set of inequalities exist, implies
$$\lim_{t \to \infty} \lim_{m \to \infty} K_f^{m}(s,X_t)(v) = \lim_{m \to \infty} K_f^{m}(s,X^{*})(v).$$

Note however that by standard dynamic programming arguments, $\lim_{m \to \infty} K_f^{m}(s,X)(v)$
equals $J_X(f,s)$. Thus we have proved that for every $s \in S$, $J_{X_t}(f,s) \to_{t \to \infty} J_{X^{*}}(f,s)$. □
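The dynamic programming fact used at the end of this proof, that repeated application of the operator converges to the total reward regardless of the starting function, can be illustrated on a small example. The numbers below are hypothetical: `R` stands for $R_X(s, f(X,s))$ and `Q` for $Q_X(s'|s, f(X,s))$ at a fixed global state and policy, with every row of `Q` summing to 0.8 so that a player dies with probability 0.2 at each jump (an assumption playing the role of (A1)).

```python
import numpy as np

# Hypothetical data for a fixed global state X and a fixed policy f.
R = np.array([0.0, 1.0, 2.0])          # one-step rewards R_X(s, f(X, s))
Q = np.array([[0.5, 0.3, 0.0],         # Q[s, s'] ~ Q_X(s'|s, f(X, s)); rows sum to 0.8,
              [0.2, 0.4, 0.2],         # the missing mass 0.2 is the death probability
              [0.0, 0.3, 0.5]])

def K(v):
    # One application of the operator: reward now plus continuation value.
    return R + Q @ v

v = np.zeros(3)
for _ in range(500):
    v = K(v)

# Closed form of the total expected reward: J = (I - Q)^{-1} R.
J = np.linalg.solve(np.eye(3) - Q, R)
```

Because the survival probability is bounded away from one, the iterates contract geometrically toward `J`, whatever bounded function the iteration starts from.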


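Proof of Theorem 2 First, note that
$$X_t \preceq_{SD} \phi(X_t, X_t, \underline{B}(X_t)) \tag{13}$$
is by definition equivalent to
$$\sum_{s \in S} \big[ \phi^{s}(X_t, X_t, \underline{B}(X_t)) - X_t^{s} \big] h(s) \ge 0$$
for any nondecreasing function $h : S \to \mathbb{R}$, and further by the definition of $\phi$ and (A5) to
$$\sum_{s \in S} \Big[ \sum_{s' \in S} X_t^{s'} \lambda(s', X_t)\, q(s|s', X_t, \underline{B}(X_t)(s')) - X_t^{s} \lambda(s, X_t) \Big] h(s) \ge 0. \tag{14}$$
Next, define$^{10}$
$$H(h)(t) = \sum_{s \in S} X_t^{s}(\underline{B})\, h(s),$$
where (as before) $h$ is an arbitrary nondecreasing function from $S$ to $\mathbb{R}$. Then by (1) and (14)
$$\frac{dH(h)}{dt} = \sum_{s \in S} \dot{X}_t^{s} h(s) = \sum_{s \in S} \Big[ \sum_{s' \in S} X_t^{s'} \lambda(s', X_t)\, q(s|s', X_t, \underline{B}(X_t)(s')) - X_t^{s} \lambda(s, X_t) \Big] h(s) \ge 0.$$
This however means that as long as (13) holds, the global state of the game is increasing
as time increases. Of course, it also implies that $a^{\alpha}_{T^{\alpha}_{i+1}} \ge a^{\alpha}_{T^{\alpha}_{i}}$, $i = 0, 1, \ldots, i^{\alpha}_{e} - 1$ for any
player $\alpha$, as $\underline{B}$ is nondecreasing.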

Next assume that at some time $t$ (13) is violated. Then, since at the beginning of the game
it was by assumption true, and because of the continuity of the trajectory of $X_t$, there must
exist a function $h_0$ such that
$$\sum_{s \in S} \big[ \phi^{s}(X_t, X_t, \underline{B}(X_t)) - X_t^{s} \big] h_0(s) = 0.$$
Then, it is easy to see that
$$\frac{dH(h_0)}{dt} = \sum_{s \in S} \dot{X}_t^{s} h_0(s) = \sum_{s \in S} \Big[ \sum_{s' \in S} X_t^{s'} \lambda(s', X_t)\, q(s|s', X_t, \underline{B}(X_t)(s')) - X_t^{s} \lambda(s, X_t) \Big] h_0(s) = 0,$$
which implies that when the boundary of the set where (13) is satisfied is reached, the
trajectory cannot leave the set.

10 In the remainder of the proof we skip the information that the global state corresponds to all players using
policy $\underline{B}$.

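Next note that since at any time $t$, $X_t \preceq_{SD} \delta[\max\{S\}]$, the fact that $X_t$ is increasing implies
that it converges to some $X^{*}$ (recall that the stochastic domination ordering is equivalent to
ordering of CDFs, which, as $S$ is finite, is in turn equivalent to componentwise ordering in
$\mathbb{R}^{|S|}$).

Next, define
$$\widetilde{B}(X)(s) = \begin{cases} \underline{B}(X)(s) & \text{for } X \ne X^{*} \\ \lim_{t \to \infty} \underline{B}(X_t)(s) & \text{for } X = X^{*}. \end{cases}$$

We will now show that $X^{*}$ is a stationary distribution corresponding to $\widetilde{B}$. From the definition
of $\widetilde{B}$ and the continuity of $\lambda$ and $q$ we can infer that
$$\lim_{t \to \infty} \Big[ \sum_{s' \in S} X_t^{s'} \lambda(s', X_t)\, q(s|s', X_t, \underline{B}(X_t)(s')) - X_t^{s} \lambda(s, X_t) \Big]$$
$$= \sum_{s' \in S} (X^{*})^{s'} \lambda(s', X^{*})\, q(s|s', X^{*}, \widetilde{B}(X^{*})(s')) - (X^{*})^{s} \lambda(s, X^{*}). \tag{15}$$
On the other hand, since $X_t \to_{t \to \infty} X^{*}$ monotonically, for any $s$, $\dot{X}_t^{s} \to 0$, which is
equivalent to
$$\lim_{t \to \infty} \Big[ \sum_{s' \in S} X_t^{s'} \lambda(s', X_t)\, q(s|s', X_t, \underline{B}(X_t)(s')) - X_t^{s} \lambda(s, X_t) \Big] = 0.$$

Combining this with (15) we obtain
$$\sum_{s' \in S} (X^{*})^{s'} \lambda(s', X^{*})\, q(s|s', X^{*}, \widetilde{B}(X^{*})(s')) - (X^{*})^{s} \lambda(s, X^{*}) = 0.$$
But this, by the definition of $\phi$ and (A5), means that $X^{*}$ is a fixed point of $\phi(\cdot, X^{*}, \widetilde{B})$, and
consequently $X^{*}$ is a stationary distribution corresponding to $\widetilde{B}$.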


Next, note that under (A6) any vector of actions $\bar{a} = (a_s)_{s \in S}$ from sets $A(s,X)$ can be
obtained as a value of a global state-independent policy defined by
$$f_{\bar{a}}(X, s) = a_s, \quad s \in S.$$

Clearly, each of the policies $f_{\bar{a}}$ satisfies (11). So does $\widetilde{B}$ by its construction. Thus we can use
Lemma 4 to show the following:
$$J_{X^{*}}(\widetilde{B}, s) = \lim_{t \to \infty} J_{X_t}(\widetilde{B}, s) \ge \lim_{t \to \infty} J_{X_t}(f_{\bar{a}}, s) = J_{X^{*}}(f_{\bar{a}}, s),$$
where the inequality follows from the fact that $\widetilde{B}(X) = \underline{B}(X)$ for $X \ne X^{*}$ and the fact that
for each $t$, $\underline{B}(X_t)$ is a best response to $X_t$. But this proves that $\widetilde{B}(X^{*})$ is a best response to $X^{*}$,
as strategies $f_{\bar{a}}$ cover all the possible actions that a player can use at the global state $X^{*}$. To
end the proof, note that by the monotonicity of $\underline{B}$,
$$\widetilde{B}(X^{*}) = \lim_{t \to \infty} \underline{B}(X_t) \le \underline{B}(X^{*}).$$
This however implies that $\widetilde{B}(X^{*}) = \underline{B}(X^{*})$, as the latter is by definition the smallest best
response to $X^{*}$. Since the two strategies could only differ at $X^{*}$, this means that they are equal
and that $(\underline{B}, X^{*})$ is an equilibrium in the game. □


Remark 5 The assumption (10) is both difficult to check and rather restrictive. In [1], to avoid
this kind of problem, the authors start the algorithm by setting the initial state of each player to
$\min\{S\}$. This kind of solution seems doubtful. Note that the notion of state of the game or that
of a player captures the properties of his environment, and as such depends only partially (and
in a nondeterministic way) on his decisions. It cannot thus be set by a player at the beginning
of the game. Note however that in our setting this kind of assumption could make more sense
than in [1]. In our framework $\min\{S\} = s_0$, so assuming that the algorithm is initialized by
setting individual states of all the players to $s_0$ would mean that the game is started before any
players join it, which could make sense in many practical applications. On the other hand,
for $X_0 = \delta[s_0]$ (10) is trivially satisfied, as it reduces to $\delta[s_0] \preceq_{SD} q(\cdot|s_0, \delta[s_0], a_0)$, which
is true for any transition probability defined on $S$.
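Since the state space is finite and ordered, a condition such as (10) can be checked mechanically by comparing upper tail sums, which is the CDF characterization of stochastic dominance. The helper below is a minimal sketch; the function name `sd_leq` and the example vector `q_newborn` are hypothetical and serve only to illustrate why the point mass at the lowest state is dominated by every distribution.

```python
import numpy as np

def sd_leq(X, Y, tol=1e-12):
    """Return True if X is stochastically dominated by Y (states listed in increasing order)."""
    # Upper tail sums P(state >= k), starting from the highest state.
    upper_X = np.cumsum(np.asarray(X, dtype=float)[::-1])
    upper_Y = np.cumsum(np.asarray(Y, dtype=float)[::-1])
    return bool(np.all(upper_X <= upper_Y + tol))

delta_s0 = np.array([1.0, 0.0, 0.0])    # all mass at the lowest state s0
q_newborn = np.array([0.2, 0.5, 0.3])   # hypothetical q(.|s0, delta[s0], a0)
```

For example, `sd_leq(delta_s0, q_newborn)` holds while the reverse comparison fails, and the first statement remains true for any probability vector in place of `q_newborn`.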


Remark 6 In our setting the players join the game at different times. This naturally implies
that those joining at later stages of the game hardly need any adjustment to their initial
strategies, because the global state of the game is already very close to $X^{*}$ when they appear.
Consequently, the expected rewards they receive over their lifetime are very close to the
equilibrium payoffs corresponding to the smallest equilibrium in the game.


3.5 Examples of Application of the Model 


In the remainder of this section we will briefly present some natural applications of our 
framework. Some further ones could be possible, if the sets of states and actions were multi- 
dimensional or the rewards could be negative. Generalizing to these situations is left however 
for further research. 

Research and development race In this game the players are firms choosing their
technological profile. Let $s$ be the level of technological development of a firm's products and $a$
its investment in research. The transition times for a player are technological breakthroughs
for his firm. It is obvious that these moments do not come at the same time for each of the
players, so this corresponds well to our framework. Next, a 'death' of a player is naturally
interpreted as his firm's bankruptcy. Finally, let $r$ describe his profit minus investment. We
assume that there is no $\tilde{r}$. It is natural to assume strategic complementarities between rewards
for different firms: a higher level of technological development of the entire industry results
in a higher demand for high-tech products. Also a higher investment in research is required
if the industry is at a higher level of development. Finally, one can argue that a firm with a higher
technological profile is less likely to go bankrupt.

Corruption game This is a variant of the game presented in [20]. The players here are 
civil servants who can be in three states: corrupt, honest, excluded from the society. The last
state can be naturally seen as a (civil) death of a player: in this state he is not able to receive
any rewards. A player's transitions happen when he has to decide on some project. These
moments are naturally different for different players. His actions describe his willingness to 
change his state. Obviously, a player who wants to be bribed is much more likely to become 
corrupt. Also the possibility of becoming corrupt increases as the society becomes more 
depraved. In corrupt state a player’s rewards are the highest and naturally increase as the 
society becomes more corrupt. Finally, the possibility of death for a player decreases as the 
society becomes more corrupt, because the control is less stringent. Thus, we can argue this 
is a game with strategic complementarities as well. 




Interdependent security A similar model has appeared in [22]. Let us consider a large 
number of computers in a cluster. Each of them is trying to avoid system failure due to 
viruses. Let $s$ describe an individual computer's security level and $a$ its investment in security.
The transition times for an individual are moments of malicious attacks against him. A
'death' of a player is the time of system failure. We can assume that $r = 1$ if the system is
OK and zero otherwise (a number of different 'health' levels with different rewards is also
possible). Further, let $\tilde{r}$ be an individual's investment in security. As one can immediately
see, this model fails to satisfy our assumptions, because $\tilde{r}$ is negative. We can however argue
that a weaker version of our assumptions presented in Remark 1 can be satisfied without
making the model unrealistic. Note, that this game is a natural example of games with 
strategic complementarities, as higher level of security for other computers results in a lower 
probability of infecting any of them. It is also natural to assume that attacks on different 
machines are not coordinated, so the moves of different players are asynchronous, like in our 
framework. 

Charging control for plug-in electric vehicles This model is inspired by the one presented
in [27]. Let us consider a large population of plug-in electric vehicles. Each of them needs
to charge its battery regularly, but tries to do it as cheaply as possible. The problem is that the
cost of energy may depend on the hour of the day: from the electricity producers' point
of view it is best if all the vehicles charge their batteries at the same time during the night,
when the overall energy consumption is relatively low, so they can impose some additional
cost on the car owners for doing otherwise. On the other hand, a vehicle whose battery is
empty needs to be recharged immediately, as otherwise it will decrease its owner's profits
from using it. In our model each player tries to maximize his profits from use of the car
minus the charging costs over the lifetime of the vehicle. There are two possible actions:
$a = 1$ (not to charge) and $a = 2$ (to charge) and a number of states denoting the battery
charge levels (plus an artificial state $s_0 < 0$ and action $a_0 = 0$ denoting the breakdown of the
car). The transition times can be viewed as moments when the battery of a given vehicle
can be charged, so we can assume $\lambda$ is constant. The battery state at each of the transition times
decreases by one with some positive probability, decreases to $s_0$ with some smaller positive
probability and remains constant with the remaining one unless the user decides to charge
the battery; then it increases to the maximum battery level $s_{\max}$. The immediate reward is
of the form $r(s,X,a,Z) = R\,\mathbb{1}\{s > 0\}$, where $R$ is the reward from the exploitation of the
vehicle, while $\tilde{r}$ is defined as
$$\tilde{r}(s,X,a,Z) = \mathbb{1}\{a = 2\} \big[ p(s - s_{\max}) - c\,E[(a - Z)^2] \big],$$
where $p$ is the nominal energy price and $c$ is the additional cost for deviating from the
average policy of the population. Again, this model fails to satisfy assumptions (A1–A5) ($\tilde{r}$
is nonpositive and it depends on $Z$), but it can be directly checked that for $R$ big enough
it satisfies all the assumptions of the model combining its two generalizations described in
Remarks 1 and 2.
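The two reward components of the charging example can be written out as a short sketch. All numerical values and names below are hypothetical (`S_MAX`, `P_NOMINAL`, `C_DEV`, `R_USE`), the deviation term is taken to be the mean squared difference between the player's action and the action $Z$ of a randomly drawn member of the population, and $Z$ is summarized by the share of the population that charges. The global state $X$ does not enter either formula and is therefore omitted from the signatures.

```python
S_MAX = 4          # hypothetical maximum battery level s_max
P_NOMINAL = 0.5    # hypothetical nominal energy price p
C_DEV = 0.2        # hypothetical cost c of deviating from the population
R_USE = 10.0       # hypothetical reward R from using the vehicle

def r(s):
    # Reward from exploitation of the vehicle: R * 1{s > 0}.
    return R_USE if s > 0 else 0.0

def r_tilde(s, a, share_charging):
    # Charging cost, paid only when the player charges (a == 2).
    if a != 2:
        return 0.0
    # E[(a - Z)^2] for a = 2, where Z = 2 with probability share_charging and Z = 1 otherwise.
    expected_sq_dev = 1.0 - share_charging
    return P_NOMINAL * (s - S_MAX) - C_DEV * expected_sq_dev
```

With these definitions `r_tilde` is never positive, it grows with the battery level, and it grows with the share of the population that charges, which is the crowd-seeking feature of the example.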


Remark 7 It is worth noting that the last model is one of many models considered in the
engineering literature where the so-called crowd-seeking behavior is beneficial for the
players. Strategic complementarity between states or actions of the players seems a perfect
mathematical description of this kind of situation. It turns out however that engineering
applications of our model are limited for several reasons. The first one is that typically
engineering models consider costs, not rewards, so the positivity assumption appearing in
(A3) (very important, since we consider a total reward model) fails. The second one is that
we also assume that $r$ and $\tilde{r}$ are nondecreasing in $s$, which often is not satisfied. One should
however note that this monotonicity assumption is crucial in proving that the aggregate utility
of each player preserves the strategic complementarity structure, so we cannot easily get rid
of it. Finally, problems can be caused by the fact that we assume that the state space is
a sublattice of $\mathbb{R}$ (which is important because for $S \subset \mathbb{R}^{n}$, $n \ge 2$, the set $\Delta(S)$ does not
preserve the lattice structure).


4 Relation to Games with Finitely Many Players 


In this section we provide a result which links the model with a continuum of players studied
above with related models with finite numbers of players. In turn, this result provides an
explanation for the use of the Kurtz dynamics (1) for the global state of the game. To begin
with, we need to introduce the finite models we will discuss below. Let $\Gamma$ denote the game
with continuum of players defined in Sect. 2. Then $\Gamma_n$ will denote its counterpart with $n$
players played in exactly the same way as game $\Gamma$ and such that:

(a) The global state of the game at time $t$ is denoted by $X_t[n]$ and defined by the formula
$$X_t^{s}[n] = \#\{\alpha \in \{1, \ldots, n\} : s_t^{\alpha} = s\}.$$
Next, the normalized global state of the game at time $t$ is denoted by $\overline{X}_t[n]$ and defined
as
$$\overline{X}_t[n] = \frac{1}{n} X_t[n].$$

(b) All the functions defining the model are defined with respect to the normalized state, and
so:
$$r[n](s_t, X_t[n], a_t) := r(s_t, \overline{X}_t[n], a_t), \qquad \tilde{r}[n](s_t, X_t[n], a_t) := \tilde{r}(s_t, \overline{X}_t[n], a_t),$$
$$q[n](\cdot|s_t, X_t[n], a_t) := q(\cdot|s_t, \overline{X}_t[n], a_t), \qquad \lambda[n](s_t, X_t[n]) := \lambda(s_t, \overline{X}_t[n]).$$
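As a small illustration of items (a) and (b), the normalized global state of an n-player game is simply the empirical distribution of the players' individual states. The function name and the sample data below are hypothetical.

```python
from collections import Counter

def normalized_state(player_states, state_space):
    # Count the players in each state (the unnormalized global state) and divide by n.
    n = len(player_states)
    counts = Counter(player_states)
    return [counts[s] / n for s in state_space]

# Five players on the state space {0, 1, 2}: one in state 0, one in state 1, three in state 2.
x_bar = normalized_state([0, 2, 1, 2, 2], state_space=[0, 1, 2])
```

The resulting vector is a point of the simplex over the state space, so it can be passed to the same reward, transition and intensity functions as the global state of the continuum model.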
Next define the subset of strategies we shall concentrate on in this section:
$$F_c = \{f \in F : f(s,X) \text{ does not depend on } X\}.$$

The following result will link the game $\Gamma$ with 'sufficiently close' games $\Gamma_n$.


Theorem 3 Suppose assumption (A1) holds and take some $\varepsilon > 0$. Then there exists an
$N \in \mathbb{N}$ such that for any $n > N$ the expected reward of player $\alpha$ from playing policy $g \in F_c$
against $f \in F_c$ played by all the other players in the game $\Gamma_n$ differs from his expected
reward when he plays $g$ against $f$ in game $\Gamma$ by at most $\varepsilon$.


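Proof First recall that $r$, $\tilde{r}$ and $\lambda$ are continuous on a compact domain, hence bounded. Let
$L$ be such that $|r(s,X,a)| < L$, $|\tilde{r}(s,X,a)| < L$ and $\lambda(s,X) < L$ for any $(s,X,a) \in D$.
In addition, note that $\lambda$ is by assumption positive, so there also exists a $\underline{\lambda} > 0$ such that
$\lambda(s,X) > \underline{\lambda}$ for any $(s,X,a) \in D$.

Next, note that under assumption (A1) the absolute value of the sum of rewards received by
(any given) player $\alpha$ from his $k$th change of state on,
$$\Big| E \sum_{i=k}^{i^{\alpha}_{e}-1} \Big( \tilde{r}\big(s^{\alpha}_{T^{\alpha}_{i}}, X_{T^{\alpha}_{i}}, a^{\alpha}_{T^{\alpha}_{i}}\big) + \int_{T^{\alpha}_{i}}^{T^{\alpha}_{i+1}} r\big(s^{\alpha}_{t}, X_t, a^{\alpha}_{t}\big)\, dt \Big) \Big|,$$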




can be bounded by 


L 2 Plie >l 5 S| — DPE i(|S]|— 1) + kli k 
( +5) lie > Dm! |—1)Plie > 11S] — 1) + te > k] 


i=0 


L i_LŁQ+!) B 
< (1+7) a- mwl" JEa -w = Ada- plr l, (16) 


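It is then immediate that there exists a $k_{\varepsilon}$ such that
$$\Big| E \sum_{i=k_{\varepsilon}}^{i^{\alpha}_{e}-1} \Big( \tilde{r}\big(s^{\alpha}_{T^{\alpha}_{i}}, X_{T^{\alpha}_{i}}, a^{\alpha}_{T^{\alpha}_{i}}\big) + \int_{T^{\alpha}_{i}}^{T^{\alpha}_{i+1}} r\big(s^{\alpha}_{t}, X_t, a^{\alpha}_{t}\big)\, dt \Big) \Big| < \frac{\varepsilon}{6}.$$

The same bound will apply to every $\Gamma_n$.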
Then, since 1,” is for any @ stochastically dominated by an exponentially distributed 
random variable with intensity À, Tia for any T > 0 we can conclude as follows: 


ke ke 
a > r se [Sat > r| 
i=0 i=0 


Since Tř are for different i independent, we can assume the same about A . Then zi P 
is Gamma-distributed with fixed parameters kẹ and A, thus the probability it is greater than 
T converges to 0 as T goes to infinity. Thus, there exists a Te > 0 such that 


|S > n|< am (17) 
6La+D” 


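Consequently, by (16) the expected reward received by any player either in model $\Gamma$ or any
of models $\Gamma_n$ from time $T_{\varepsilon}$ on can be bigger than that received until the $k_{\varepsilon}$th jump of his
individual state by no more than
$$\frac{L(\underline{\lambda}+1)}{\underline{\lambda} p_0} \cdot \frac{\underline{\lambda} p_0 \varepsilon}{6L(\underline{\lambda}+1)} = \frac{\varepsilon}{6},$$
which implies that the expected reward received by player $\alpha$ until time $\min\{T_{\varepsilon}, T^{\alpha}_{i_e}\}$ in any
of these models differs from the expected reward over his lifetime by at most $\frac{\varepsilon}{3}$.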

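Now note that since $f \in F_c$ and by Lipschitz continuity and boundedness of $q$ and $\lambda$, all of
the intensities $\sum_{s' \in S} X^{s'} \lambda(s',X)\, q(s|s',X,f(s',X))$, $-X^{s} \lambda(s,X)$ are Lipschitz-continuous
and bounded functions of $X$, and so by the Kurtz theorem (see Theorem 5.3 in [32]) if all the
players except $\alpha$ are using policy $f$,
$$P\Big[ \sup_{0 \le t \le T_{\varepsilon}} \big| \overline{X}_t[n] - X_t \big| \ge \delta \Big] \le D e^{-nF(\delta)}$$
for some positive constant $D$ and a function $F$ satisfying $\lim_{\delta \searrow 0} \frac{F(\delta)}{\delta^2} \in (0, \infty)$. By this last
property, the probability bounded above converges to zero as $n$ goes to infinity at the rate of $e^{-n}$,
so for $n$ large enough, say $n > N_{\delta}$,
$$P\Big[ \sup_{0 \le t \le T_{\varepsilon}} \big| \overline{X}_t[n] - X_t \big| \ge \delta \Big] < \frac{\underline{\lambda} p_0 \varepsilon}{3L(\underline{\lambda}+1)} \tag{18}$$
for any given $\delta > 0$.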
Further, notice that $r(s,X,g(s,X))$ and $\tilde{r}(s,X,g(s,X))$ are continuous on a compact
domain, which by Heine's theorem implies that they are uniformly continuous, so we can
find a $\delta > 0$ such that for any $X, X' \in \Delta(S)$,


IX — X'| <8 => sup|r(s, X, g(s, X)) —r(s, X’, g(s, X)| < — (19) 
ses AT; 
and s 
|X — X'I < 8 = sup Fs, X, g(s, X)) — F(s, X’, g(s, X)| < Ê. (20) 
ses 


Then, let us fix a trajectory of X,[n] and define 


R(s, t) = Alt < Te] Fe Xr, (s, Xr) 


Te pmin(T,(T,—t)) 
+ f i r(s, Xt4us es, X)) due" XT A(s, X,) dT |, 
0 0 


O(s', Bis, t) = 1 q (s's, Xi, g(s, Xe >T A(s, X) dT, 
B 


where $B$ is any Borel set on $\mathbb{R}^{+}$, and analogously $R[n](s,t)$ and $Q[n](s',B|s,t)$ by replacing
$X_t$ in the above formulas with $\overline{X}_t[n]$ whenever $\sup_{t \in [0,T_{\varepsilon}]} |\overline{X}_t[n] - X_t| < \delta$ (and doing
nothing otherwise). Note that measurability of the functions integrated in the above formulas
is guaranteed by their continuity. Also in the cases of $R[n](s,t)$ and $Q[n](s',B|s,t)$, even
though the functions integrated there are not continuous, their domain $S \times \mathbb{R}^{+}$ can be divided
into a countable number of subsets of form $S \times [t_1, t_2)$ where they are continuous. These sets
are obviously Borel, which guarantees the measurability of the functions.

If we combine (19-20) with the definitions of R[n] and Q[n], we obtain that for n large 
enough 

|R(s, t) — R[n](s, t)| < 2 G + ah) = 6, (21) 

which means that $R[n]$ converges uniformly to $R$ as $n$ goes to infinity, and does so uniformly
both in the state $(s,t)$ and the trajectory $\overline{X}_t[n]$. Similarly, we can show that the density
appearing in the definition of $Q[n]$ converges uniformly to that appearing in the definition of $Q$.
Note however that uniform convergence of densities together with uniform convergence and
boundedness of rewards implies that
$$R[n](s_0,0) + \sum_{k=1}^{k_{\varepsilon}} \sum_{s \in S} \int_{\mathbb{R}^{+}} R[n](s,u)\, Q[n]^{k}(s,du|s_0,0) \;\to_{n \to \infty}\; R(s_0,0) + \sum_{k=1}^{k_{\varepsilon}} \sum_{s \in S} \int_{\mathbb{R}^{+}} R(s,u)\, Q^{k}(s,du|s_0,0), \tag{22}$$
where the convergence is uniform with respect to both $s_0$ and $\overline{X}_t[n]$. Clearly, $R$ and $Q$ were
constructed in such a way that the RHS of the above equation equals the expected reward
received by player $\alpha$ until time $\min\{T_{\varepsilon}, T^{\alpha}_{i_e}\}$ in model $\Gamma$. On the other hand, if we take the
expected value of the LHS over all trajectories of $\overline{X}_t[n]$, (16) and (18) imply that for $n > N_{\delta}$
it will differ by at most
$$\frac{L(\underline{\lambda}+1)}{\underline{\lambda} p_0} \cdot \frac{\underline{\lambda} p_0 \varepsilon}{3L(\underline{\lambda}+1)} = \frac{\varepsilon}{3} \tag{23}$$




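from the expected reward received by player $\alpha$ until time $\min\{T_{\varepsilon}, T^{\alpha}_{i_e}\}$ in model $\Gamma_n$.

Now, if we take $n$ big enough, say bigger than $N_1$, the supremum over all $s_0$ and all $\overline{X}_t[n]$
of the two sides of (22) will differ by at most $\frac{\varepsilon}{3}$. This however, together with (23) and the fact
that the expected reward until time $\min\{T_{\varepsilon}, T^{\alpha}_{i_e}\}$ differs from that over the lifetime of a player
by at most $\frac{\varepsilon}{3}$, will imply that the reward received by player $\alpha$ in model $\Gamma$ differs from that
received in models $\Gamma_n$ for$^{11}$ $n > \max\{N_{\delta}, N_1\}$ by at most
$$\frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon,$$
which ends the proof. □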
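The Kurtz-type convergence on which this proof rests can be illustrated with a small stochastic simulation. The sketch below is a toy, not the model of the paper: it assumes two individual states, a constant jump intensity `lam`, and a hypothetical rule under which a jumping player lands in state 1 with probability 0.3 + 0.4x, where x is the current fraction of players in state 1. For this toy the limiting ODE is dx/dt = lam (0.3 - 0.6 x), whose solution from x(0) = 0 is 0.5 (1 - exp(-0.6 lam t)); the function returns the largest gap between the simulated n-player fraction and this solution on [0, T].

```python
import math
import random

def sup_deviation(n, T=5.0, lam=1.0, seed=0):
    """Largest gap on [0, T] between the n-player fraction in state 1 and the ODE limit."""
    rng = random.Random(seed)
    t, k, worst = 0.0, 0, 0.0            # k = number of players currently in state 1
    while True:
        t += rng.expovariate(n * lam)    # next jump among n players, each jumping at rate lam
        if t > T:
            return worst
        x = k / n
        jumper_in_one = rng.random() < x               # the jumping player is uniform over players
        lands_in_one = rng.random() < 0.3 + 0.4 * x    # toy transition law, depends on x
        k += int(lands_in_one) - int(jumper_in_one)
        ode = 0.5 * (1.0 - math.exp(-0.6 * lam * t))   # solution of dx/dt = lam*(0.3 - 0.6x), x(0) = 0
        worst = max(worst, abs(k / n - ode))
```

Calling `sup_deviation(n)` for growing `n` typically shows the gap shrinking roughly like one over the square root of `n`, in line with the exponential bound quoted from the Kurtz theorem.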


Remark 8 The restriction of strategies to the set $F_c$ may seem quite strong but, since at any
fixed global state $X$ a stationary strategy reduces to a mapping from $S$ to $A$, it is enough to
show the existence of approximate equilibria defined in a way similar to the way equilibria are
defined for mean-field games, which is obviously much weaker than the way Nash equilibria are
defined. This is exactly what is done in the corollary below. On the other hand, note that the result presented
in Theorem 3 can be easily generalized (in the sense that the proof will follow along the same
lines as here) to Lipschitz-continuous randomized stationary strategies. However, as we
limited our considerations to pure strategies in most of the paper, we have decided to present
this result in this weaker form.


To formulate the next result, which will link equilibria of the mean-field game $\Gamma$ with
approximate equilibria of games $\Gamma_n$, we need to introduce the following concept.


Definition 1 A stationary strategy $f$ and a measure $\mu \in \Delta(S)$ are in $\varepsilon$-weak equilibrium
in the semi-Markov n-person counterpart of mean-field game with total reward $\Gamma_n$ if $\mu$ is a
stationary global state corresponding to $f$ and for every other stationary strategy $g \in F_c$,
$$J(f, f, \rho) \ge J(g, f, \rho) - \varepsilon,$$
where $\rho = q(\cdot|s_0, \mu, a_0)$ is the distribution of individual states of new-born players when
the global state is $\mu$.


The following result is an immediate consequence of Theorems 1 and 3.


Corollary 1 Suppose that the total reward mean-field game $\Gamma$ satisfies assumptions (A1–A6).
Then for $n$ big enough $(\underline{B}(\underline{X}), \underline{X})$ and $(\overline{B}(\overline{X}), \overline{X})$ are $\varepsilon$-weak equilibria in n-person
counterparts of $\Gamma$, $\Gamma_n$.


5 Conclusions 


In our paper we presented a model of a mean-field game where each of infinitely many players
controls his own continuous time Markov chain of private states, but the global state follows
an ordinary differential equation. We have made two main contributions here: the first one is
the generalization of this type of games to a novel model where players, instead of maximizing
some payoff accumulated over the entire game, maximize the reward obtained during their
lifetime, which may be different for different players. Since any dead player can be replaced
after some time by a newborn one, after some time stationary behavior of the system is
obtained, which is then used to define a mean-field-type equilibrium. We have provided an
approximation result linking this new model with its n-player counterparts for n approaching
infinity under some very mild assumptions.

11 Also with $\delta$ selected in a way guaranteeing that the difference between the sides of (22) is below $\frac{\varepsilon}{3}$.

The second main contribution of the paper is an equilibrium-existence result for mean-field 
game model discussed in this paper under some strategic complementarity conditions. These 
assumptions differ significantly from those discussed in the mean-field game literature, as no 
conditions based on convexity or strict monotonicity of the functions defining our model are 
required. Instead, properties implying that an increase in states of most of the players makes 
increase in any individual’s state profitable and that an increase in one’s state makes higher 
actions more profitable are assumed. This allows us to obtain the existence of equilibria 
in strategies with some monotonicity properties as well as the convergence of a myopic 
learning procedure. Importantly, it turns out that many real-life applications of
mean-field models satisfy our strategic complementarity assumptions. However, the applications
of our contributions are limited, especially by two of these assumptions: positivity of reward functions
and one-dimensional state space. It will be very interesting to see a generalization of our
results getting rid of these two assumptions.


Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 Interna- 
tional License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and 
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, 
provide a link to the Creative Commons license, and indicate if changes were made. 